feat: cluster autoscaler #251

Merged
merged 2 commits into main from feat.autoscaling
Apr 18, 2024

Conversation

cbzzz
Contributor

@cbzzz cbzzz commented Apr 15, 2024

What type of PR is this?
/kind feature

What this PR does / why we need it:
Adds a new cluster-autoscaler flavor that provides an add-on to autoscale workload cluster nodes via Cluster Autoscaler.

Testing

  1. In addition to your workload cluster environment variables, set up the new autoscaling variables:
$ export CLUSTER_AUTOSCALER_VERSION=v1.29.0
# Optional: If specified, these values must be explicitly quoted!
$ export WORKER_MACHINE_MIN='"1"'
$ export WORKER_MACHINE_MAX='"10"'
  2. Create a cluster using the Cluster Autoscaler flavor:
$ clusterctl generate cluster ${CLUSTER_NAME} \
  --infrastructure linode:0.0.0 \
  --flavor cluster-autoscaler \
  | kubectl apply -f -
  3. When the Cluster is Ready, download the kubeconfig file and deploy any workload:
$ kubectl get secret ${CLUSTER_NAME}-kubeconfig -o jsonpath='{.data.value}' \
  | base64 -d - > /tmp/${CLUSTER_NAME}-kubeconfig

# Example workload
$ KUBECONFIG=/tmp/${CLUSTER_NAME}-kubeconfig kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  selector:
    matchLabels:
      app: nginx
  replicas: 0
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:latest
        resources:
          requests:
            memory: 1Gi
            cpu: 1000m
EOF
  4. Scale the workload beyond the workload cluster's capacity to trigger a scale-up event:
$ KUBECONFIG=/tmp/${CLUSTER_NAME}-kubeconfig kubectl scale deployment/nginx --replicas 2
  5. The Cluster Autoscaler should scale up any detected MachineSets, MachineDeployments, or MachinePools to meet scheduling requirements (a way to check which node groups were discovered is sketched after the events below):
# Autoscaler logs on management cluster
$ kubectl logs deploy/${CLUSTER_NAME}-cluster-autoscaler
default/cbzzz-capl-cluster-autoscaler-7dfd67c69d-dbhwh[cluster-autoscaler]: I0417 14:10:25.581700       1 clusterapi_provider.go:68] discovered node group: MachineDeployment/default/cbzzz-capl-md-0 (min: 1, max: 10, replicas: 1)
...
default/cbzzz-capl-cluster-autoscaler-7dfd67c69d-dbhwh[cluster-autoscaler]: I0417 14:10:25.582547       1 orchestrator.go:185] Estimated 1 nodes needed in MachineDeployment/default/cbzzz-capl-md-0
default/cbzzz-capl-cluster-autoscaler-7dfd67c69d-dbhwh[cluster-autoscaler]: I0417 14:10:25.582577       1 orchestrator.go:291] Final scale-up plan: [{MachineDeployment/default/cbzzz-capl-md-0 1->2 (max: 10)}]
default/cbzzz-capl-cluster-autoscaler-7dfd67c69d-dbhwh[cluster-autoscaler]: I0417 14:10:25.582628       1 executor.go:147] Scale-up: setting group MachineDeployment/default/cbzzz-capl-md-0 size to 2

# Node events on workload cluster
$ KUBECONFIG=/tmp/${CLUSTER_NAME}-kubeconfig kubectl events --for node/
6m5s                    Warning   FailedScheduling          Pod/nginx-5b656d96b5-vhwxc            0/2 nodes are available: 1 Insufficient cpu, 1 node(s) had untolerated taint {node-role.kubernetes.io/control-plane: }. preemption: 0/2 nodes are available: 1 No preemption victims found for incoming pod, 1 Preemption is not helpful for scheduling.
5m53s                   Normal    TriggeredScaleUp          Pod/nginx-5b656d96b5-vhwxc            pod triggered scale-up: [{MachineDeployment/default/cbzzz-capl-md-0 1->2 (max: 10)}]
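For reference, the autoscaler's Cluster API provider discovers node groups from the min/max scaling annotations on Cluster API objects. A hedged way to confirm what was discovered, run against the management cluster (the MachineDeployment name is an assumption based on the ${CLUSTER_NAME}-md-0 pattern seen in the logs above):
# Assumed resource name; adjust to match your MachineDeployment
$ kubectl get machinedeployment ${CLUSTER_NAME}-md-0 -o jsonpath='{.metadata.annotations}'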
  6. Delete the workload:
$ KUBECONFIG=/tmp/${CLUSTER_NAME}-kubeconfig kubectl delete deployment/nginx
  7. The Cluster Autoscaler should scale down any associated MachineSets, MachineDeployments, or MachinePools. You may need to restart the Cluster Autoscaler with the --scale-down-unneeded-time=1s setting for a quicker reaction time (one way to set this flag is sketched after the events below):
# Autoscaler logs on management cluster
$ kubectl logs deploy/${CLUSTER_NAME}-cluster-autoscaler
default/cbzzz-capl-cluster-autoscaler-7dfd67c69d-dbhwh[cluster-autoscaler]: I0417 14:19:59.258406       1 clusterapi_provider.go:68] discovered node group: MachineDeployment/default/cbzzz-capl-md-0 (min: 1, max: 10, replicas: 2)
default/cbzzz-capl-cluster-autoscaler-7dfd67c69d-dbhwh[cluster-autoscaler]: I0417 14:19:59.258753       1 clusterapi_controller.go:714] node "cbzzz-capl-md-0-2skxq-tgrpd" is in nodegroup "MachineDeployment/default/cbzzz-capl-md-0"
default/cbzzz-capl-cluster-autoscaler-7dfd67c69d-dbhwh[cluster-autoscaler]: I0417 14:19:59.259341       1 clusterapi_controller.go:714] node "cbzzz-capl-md-0-2skxq-dtld8" is in nodegroup "MachineDeployment/default/cbzzz-capl-md-0"
default/cbzzz-capl-cluster-autoscaler-7dfd67c69d-dbhwh[cluster-autoscaler]: I0417 14:19:59.259642       1 klogx.go:87] Node cbzzz-capl-md-0-2skxq-dtld8 - cpu requested is 5% of allocatable
default/cbzzz-capl-cluster-autoscaler-7dfd67c69d-dbhwh[cluster-autoscaler]: I0417 14:19:59.259665       1 eligibility.go:104] Scale-down calculation: ignoring 1 nodes unremovable in the last 5m0s
default/cbzzz-capl-cluster-autoscaler-7dfd67c69d-dbhwh[cluster-autoscaler]: I0417 14:19:59.259683       1 cluster.go:156] Simulating node cbzzz-capl-md-0-2skxq-dtld8 removal
default/cbzzz-capl-cluster-autoscaler-7dfd67c69d-dbhwh[cluster-autoscaler]: I0417 14:19:59.259719       1 cluster.go:174] node cbzzz-capl-md-0-2skxq-dtld8 may be removed
default/cbzzz-capl-cluster-autoscaler-7dfd67c69d-dbhwh[cluster-autoscaler]: I0417 14:19:59.259733       1 nodes.go:84] cbzzz-capl-md-0-2skxq-dtld8 is unneeded since 2024-04-17 14:19:35.535959819 +0000 UTC m=+623.701969197 duration 22.916044912s
default/cbzzz-capl-cluster-autoscaler-7dfd67c69d-dbhwh[cluster-autoscaler]: I0417 14:19:59.260165       1 static_autoscaler.go:617] Scale down status: lastScaleUpTime=2024-04-17 14:10:24.375866728 +0000 UTC m=+72.541876099 lastScaleDownDeleteTime=2024-04-17 13:09:28.327657656 +0000 UTC m=-3583.506332776 lastScaleDownFailTime=2024-04-17 13:09:28.327657656 +0000 UTC m=-3583.506332776 scaleDownForbidden=false scaleDownInCooldown=true
...
default/cbzzz-capl-cluster-autoscaler-7dfd67c69d-dbhwh[cluster-autoscaler]: I0417 14:20:34.837762       1 clusterapi_provider.go:68] discovered node group: MachineDeployment/default/cbzzz-capl-md-0 (min: 1, max: 10, replicas: 2)
default/cbzzz-capl-cluster-autoscaler-7dfd67c69d-dbhwh[cluster-autoscaler]: I0417 14:20:38.871307       1 drain.go:131] All pods removed from cbzzz-capl-md-0-2skxq-dtld8
default/cbzzz-capl-cluster-autoscaler-7dfd67c69d-dbhwh[cluster-autoscaler]: I0417 14:20:38.876928       1 clusterapi_controller.go:714] node "cbzzz-capl-md-0-2skxq-dtld8" is in nodegroup "MachineDeployment/default/cbzzz-capl-md-0"
default/cbzzz-capl-cluster-autoscaler-7dfd67c69d-dbhwh[cluster-autoscaler]: I0417 14:20:38.920265       1 event_sink_logging_wrapper.go:48] Event(v1.ObjectReference{Kind:"ConfigMap", Namespace:"kube-system", Name:"cluster-autoscaler-status", UID:"7a7668d7-646d-449c-b5a7-49dcc4f96aac", APIVersion:"v1", ResourceVersion:"6415", FieldPath:""}): type: 'Normal' reason: 'ScaleDownEmpty' Scale-down: empty node cbzzz-capl-md-0-2skxq-dtld8 removed

# Node events on workload cluster
$ KUBECONFIG=/tmp/${CLUSTER_NAME}-kubeconfig kubectl events --for node/
6m10s               Normal    ScaleDown                 Node/cbzzz-capl-md-0-2skxq-dtld8      marked the node as toBeDeleted/unschedulable
6m1s                Normal    RemovingNode              Node/cbzzz-capl-md-0-2skxq-dtld8      Node cbzzz-capl-md-0-2skxq-dtld8 event: Removing Node cbzzz-capl-md-0-2skxq-dtld8 from Controller
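For a faster scale-down while testing, one hedged option is to append the flag to the autoscaler's container args with a JSON patch. The Deployment name and args layout below are assumptions based on this flavor's naming; adjust them to match your generated manifests:
# Assumed example: appends --scale-down-unneeded-time=1s to the first container's args
$ kubectl patch deployment ${CLUSTER_NAME}-cluster-autoscaler --type=json \
  -p='[{"op": "add", "path": "/spec/template/spec/containers/0/args/-", "value": "--scale-down-unneeded-time=1s"}]'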

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #

Special notes for your reviewer:

  • Due to constraints with the Kubernetes RBAC system (i.e. roles cannot be subdivided beyond namespace-granularity), the Cluster Autoscaler add-on is deployed on the management cluster to prevent leaking Cluster API data between workload clusters.

  • Currently, the Cluster Autoscaler reuses the ${CLUSTER_NAME}-kubeconfig Secret generated by the bootstrap provider to interact with the workload cluster. The kubeconfig contents must be stored in a key named value (a quick way to inspect this is sketched below). Because of this, all Cluster Autoscaler actions in the workload cluster are performed as the cluster-admin role, which may be more permissive than necessary 🙈.

See: https://cluster-api.sigs.k8s.io/tasks/automated-machine-management/autoscaling#autoscaler-running-in-management-cluster-using-service-account-credentials-with-separate-workload-cluster
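
A minimal sketch of the Secret shape this assumes (the kubeconfig must live under a key named value, matching the command in the testing steps above):
# Inspect the Secret generated by the bootstrap provider on the management cluster
$ kubectl get secret ${CLUSTER_NAME}-kubeconfig -o jsonpath='{.data.value}' | base64 -d | head
# Hypothetical example: a kubeconfig supplied by hand would need the same key name
$ kubectl create secret generic ${CLUSTER_NAME}-kubeconfig --from-file=value=/path/to/kubeconfig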

TODOs:

  • squashed commits
  • includes documentation
  • adds unit tests
  • adds or updates e2e tests

codecov bot commented Apr 15, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 53.83%. Comparing base (1b0a785) to head (0f13cea).

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #251   +/-   ##
=======================================
  Coverage   53.83%   53.83%           
=======================================
  Files          27       27           
  Lines        1566     1566           
=======================================
  Hits          843      843           
  Misses        673      673           
  Partials       50       50           

@AshleyDumaine AshleyDumaine self-requested a review April 15, 2024 14:47
@AshleyDumaine AshleyDumaine added the feature New feature or request label Apr 15, 2024
1. Set up autoscaling environment variables
```sh
export CLUSTER_AUTOSCALER_VERSION=v1.29.0
export WORKER_MACHINE_MIN="\"1\""
Collaborator

could we move this string escaping into the template instead of the variable?

Member

I don't think escaping should be needed, it's already quoted in the template

Contributor Author

I can give this another shot, but envsubst didn't play well with quoting numbers as strings in the templating. It was either stripping them too aggressively or not enough.

Contributor Author
@cbzzz cbzzz Apr 17, 2024

The issue here seems to be how clusterctl generate renders templates in conjunction with envsubst. From what I can see, it:

  1. Renders the templates and then validates the Kubernetes YAML
  2. Passes the generated YAML through envsubst

envsubst substitutes both ${var} and "${var}" so you need to explicitly specify the shell variable as a "string".

I've slightly modified the documentation commands to remove the escapes and also provided defaults for these values.
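
For context, the min/max values ultimately land as the upstream Cluster API autoscaler annotations on the MachineDeployment, and Kubernetes annotation values must be strings, which is why the quoting matters. A hedged sketch of the rendered result (fragment only; the values shown are an assumed example, not this PR's exact template output):
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
metadata:
  annotations:
    cluster.x-k8s.io/cluster-api-autoscaler-node-group-min-size: "1"
    cluster.x-k8s.io/cluster-api-autoscaler-node-group-max-size: "10"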

@cbzzz cbzzz marked this pull request as ready for review April 17, 2024 16:51
@cbzzz cbzzz force-pushed the feat.autoscaling branch 2 times, most recently from 8e75985 to 1c70a57 Compare April 17, 2024 18:16
Member
@AshleyDumaine AshleyDumaine left a comment

LGTM, I can't figure out a better workaround at the moment for the scaling annotations 🤔

Adds a new cluster-autoscaler flavor that provides an autoscaling add-on for
workload cluster nodes via [Cluster Autoscaler](https://www.github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler#cluster-autoscaler).

Due to constraints with the Kubernetes RBAC system (i.e. [roles cannot be
subdivided beyond
namespace-granularity](https://www.github.com/kubernetes/kubernetes/issues/56582)),
the Cluster Autoscaler add-on is deployed on the management cluster to prevent
leaking Cluster API data between workload clusters.

Currently, the Cluster Autoscaler reuses the `${CLUSTER_NAME}-kubeconfig` Secret
generated by the bootstrap provider to interact with the workload cluster. The
kubeconfig contents must be stored in a key named `value`. Due to this, all
Cluster Autoscaler actions in the workload cluster are performed as the
`cluster-admin` role.

See: https://cluster-api.sigs.k8s.io/tasks/automated-machine-management/autoscaling#autoscaler-running-in-management-cluster-using-service-account-credentials-with-separate-workload-cluster
@cbzzz cbzzz merged commit e9dc9fa into main Apr 18, 2024
9 checks passed
@cbzzz cbzzz deleted the feat.autoscaling branch April 18, 2024 14:22
AshleyDumaine pushed a commit that referenced this pull request Apr 19, 2024
* feat: add cluster autoscaler flavor

* docs: add cluster autoscaler flavor
amold1 pushed a commit that referenced this pull request May 17, 2024
* feat: add cluster autoscaler flavor

* docs: add cluster autoscaler flavor